278 research outputs found

    Improving Middleware Performance with AdOC: an Adaptive Online Compression Library for Data Transfer

    http://csdl2.computer.org/ In this article, we present the AdOC (Adaptive Online Compression) library. It is a user-level set of functions that enables data transmission with compression. Compression is performed dynamically during transmission, and the compression level is constantly adapted to the environment. To ease the integration of AdOC into existing software, the API is very close to the read and write UNIX system calls and respects their semantics. Moreover, the library is thread-safe and has been ported to many UNIX-like systems. We have tested AdOC under various conditions and with various data types. Results show that the library outperforms the POSIX read/write system calls on a broad range of networks (up to 100 Mbit LAN), whereas on Gbit Ethernet it provides similar performance.

    Symbolic Mapping and Allocation for the Cholesky Factorization on NUMA machines: Results and Optimizations

    We discuss some performance issues of the tiled Cholesky factorization on non-uniform memory access (NUMA) shared-memory machines. We show how to optimize thread and data placement in order to achieve performance gains of up to 50% compared to state-of-the-art libraries such as PLASMA or MKL.

    Adaptive Online Data Compression

    In distributed computing over wide-area networks, huge data sets can be transmitted quickly by compressing them before transmission. However, such an approach is not efficient on high-speed networks: the time to compress a large file and then send it is greater than the time to send the uncompressed file. In this paper, we propose an algorithm that overlaps communication with compression and adapts the compression ratio to the network speed (the slower the network, the more we use efficient but slow compression algorithms). The advantage of such an adaptive algorithm is its generality and its suitability for a large set of applications.
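
The idea of tying compression effort to network speed can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: the throughput thresholds, the function names `choose_level` and `compress_chunks`, and the use of `zlib` levels are all assumptions made for illustration, and the overlap of communication with compression is left out.

```python
import zlib


def choose_level(throughput_mbit_s: float) -> int:
    """Map a measured throughput to a zlib level (0 = store only, 9 = strongest).

    The thresholds are hypothetical; a real system would tune them.
    """
    if throughput_mbit_s >= 1000:
        return 0  # very fast link: compression would only add latency
    if throughput_mbit_s >= 100:
        return 1  # fast link: cheapest compression
    if throughput_mbit_s >= 10:
        return 6  # moderate link: zlib's usual trade-off
    return 9      # slow link: spend CPU time to save bandwidth


def compress_chunks(chunks, throughputs_mbit_s):
    """Compress each chunk with a level chosen from its throughput sample."""
    result = []
    for chunk, rate in zip(chunks, throughputs_mbit_s):
        level = choose_level(rate)
        result.append((level, zlib.compress(chunk, level)))
    return result
```

Each chunk is compressed independently, so the receiver can decompress it with `zlib.decompress` without knowing which level the sender picked.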

    New Dynamic Heuristics in the Client-Agent-Server Model

    International peer-reviewed conference with proceedings. MCT is a widely used heuristic for scheduling tasks onto grid platforms. However, when dealing with many tasks, MCT tends to dramatically delay the completion time of already mapped tasks when scheduling a new task. In this paper we propose heuristics based on two features: the historical trace manager, which simulates the environment, and the perturbation, which defines the impact a newly allocated task has on already mapped tasks. Our simulations and experiments on a real environment show that the proposed heuristics outperform MCT.

    DKPN: A Composite Dataflow/Kahn Process Networks Execution Model

    To address the high level of dynamism and variability in modern streaming applications (e.g. video decoding) as well as the difficulties in programming heterogeneous MPSoCs, we propose a novel execution model based upon both dataflow and Kahn process networks. This paper presents the semantics and properties of this hierarchical and parametric model, called DKPN. Parameters are classified, and it is shown that hints can be derived to improve the execution. A scheduler framework and policies to back the model are also presented. Experiments illustrate the benefits of our approach.

    On the complexity of task graph scheduling with transient and fail-stop failures

    This paper deals with the complexity of task graph scheduling with transient and fail-stop failures. While computing the reliability of a given schedule is easy in the absence of task replication, the problem becomes much more difficult when task replication is used. Our main result is that this problem is #P'-Complete (hence at least as hard as NP-Complete problems), with both transient and fail-stop processor failures. We also study the complexity of a restricted class of schedules, where a task cannot be scheduled before all replicas of all its predecessors have completed their execution.

    Affinité entre les processus, métriques et impact sur les performances : étude expérimentale

    Process placement, also called topology mapping, is a well-known strategy to improve parallel program execution by reducing the communication cost between processes. It requires two inputs: the topology of the target machine and a measure of the affinity between processes. In the literature, the dominant affinity measure is the communication matrix, which describes the amount of communication between processes. The goal of this paper is to study the accuracy of the communication matrix as a measure of affinity. We have done an extensive set of tests with two fat-tree machines and a 3D-torus machine to evaluate several hypotheses that are often made in the literature and to discuss their validity. First, we check the correlation between algorithmic metrics and the performance of the application. Then, we check whether a good generic process placement algorithm never degrades performance. Finally, we examine whether the structure of the communication matrix can be used to predict the gain.

    Improving MPI Applications Performance on Multicore Clusters with Rank Reordering

    Modern hardware architectures featuring multicores and a complex memory hierarchy raise challenges that need to be addressed by parallel application programmers. It is therefore tempting to adapt an application's communication pattern to the characteristics of the underlying hardware. The MPI standard features several functions that allow the ranks of MPI processes to be reordered according to a graph attached to a newly created communicator. In this paper, we explain how the MPICH2 implementation of the MPI_Dist_graph_create function was modified to reorder the MPI process ranks so that the application's communication pattern matches the hardware topology. The experimental results on a multicore cluster show that improvements can be achieved as long as the application's communication pattern is expressed by a relevant metric.

    Scheduling on the Grid : Historical Trace and Dynamic Heuristics

    We present a historical trace manager and new dynamic scheduling heuristics that can be used, and are studied, in the client-agent-server model on the "grid". These heuristics rely on the common acknowledgment of the characteristics of the tasks submitted to the agent, but also on the construction of the underlying historical trace of the different tasks submitted to each server. We study each heuristic and compare them on several metrics to an instantiation of MCT (Minimum Completion Time), chosen as the reference heuristic. The simulation experiments we have conducted show that they are likely to give good results when tested in a real environment.
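
For readers unfamiliar with the reference heuristic, MCT assigns each incoming task to the server on which it is expected to finish earliest. The Python sketch below shows only that baseline, not the trace-based heuristics of the paper. The function name `mct_schedule`, the cost-matrix input, and the assumption that run times are known exactly are all illustrative.

```python
def mct_schedule(task_costs, num_servers):
    """Greedy Minimum Completion Time assignment.

    task_costs[i][s] is the assumed run time of task i on server s.
    Tasks are handled in arrival order; each one goes to the server
    where it would complete earliest given the work already queued.
    """
    ready = [0.0] * num_servers  # time at which each server becomes free
    assignment = []
    for costs in task_costs:
        # Completion time on server s = time it becomes free + run time there.
        best = min(range(num_servers), key=lambda s: ready[s] + costs[s])
        ready[best] += costs[best]
        assignment.append(best)
    return assignment, ready
```

The sketch models each server as a simple queue, so a new task never slows down tasks that are already mapped. On a real time-shared server it would, and that perturbation is what the abstracts in this listing set out to account for.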